Universal and idiosyncratic characteristic lengths in bacterial genomes.
Abstract
In condensed matter physics, simplified descriptions are obtained by coarse-graining the features of a system at a certain characteristic length, defined as the typical length beyond which some properties are no longer correlated. From a physics standpoint, in vitro DNA thus has a characteristic length of 300 base pairs (bp), the Kuhn length of the molecule, beyond which correlations in its orientation are typically lost. From a biology standpoint, in vivo DNA has a characteristic length of 1000 bp, the typical length of a gene. Since bacteria live in very different physico-chemical conditions and since their genomes lack translational invariance, whether larger, universal characteristic lengths exist is a non-trivial question. Here, we examine this problem by leveraging the large number of fully sequenced genomes available in public databases. By analyzing GC content correlations and the evolutionary conservation of gene contexts (synteny) in hundreds of bacterial chromosomes, we conclude that a fundamental characteristic length of around 10-20 kb can be defined. This characteristic length reflects elementary structures involved in the coordination of gene expression, which are present all along the genome of nearly all bacteria. Technically, reaching this conclusion required us to implement methods that are insensitive to the presence of large idiosyncratic genomic features, which may coexist alongside these fundamental universal structures.
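As a rough illustration of what a "GC content correlation" measures, the sketch below computes the GC fraction in consecutive windows of a sequence and then the autocorrelation of that profile at a given lag. It is a naive estimator written for this page, not the method of the paper: the abstract states that the authors needed methods insensitive to large idiosyncratic genomic features, which this plain autocorrelation is not. The function names `gc_profile` and `autocorrelation`, the window size, and the synthetic sequence are all assumptions made for the example.

```python
def gc_profile(seq, window):
    """GC fraction in consecutive, non-overlapping windows of `window` bases.

    A trailing partial window is ignored.
    """
    seq = seq.upper()
    profile = []
    for start in range(0, len(seq) - window + 1, window):
        chunk = seq[start:start + window]
        gc_count = sum(1 for base in chunk if base in "GC")
        profile.append(gc_count / window)
    return profile


def autocorrelation(values, lag):
    """Normalized autocorrelation of `values` at `lag` (measured in windows).

    Uses the global mean and variance of the profile, so it is sensitive to
    large-scale trends; this is the naive estimator, kept simple on purpose.
    """
    n = len(values) - lag
    if lag < 0 or n < 2:
        raise ValueError("lag must be non-negative and leave at least 2 pairs")
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    if var == 0:
        return 0.0  # a flat profile carries no correlation signal
    cov = sum((values[i] - mean) * (values[i + lag] - mean) for i in range(n)) / n
    return cov / var


if __name__ == "__main__":
    # Synthetic toy "genome" (an assumption, not real data): alternating
    # GC-only and AT-only blocks of 1000 bases each.
    toy = ("G" * 1000 + "A" * 1000) * 10
    profile = gc_profile(toy, window=500)
    for lag in (1, 2, 4):
        print(lag * 500, "bp:", autocorrelation(profile, lag))
```

On a real chromosome, one would look for the distance at which such a correlation decays; the characteristic length reported in the abstract (around 10-20 kb) comes from the authors' more robust analysis, not from a calculation of this kind.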
Related articles
Species Specific DNA Profiling Mycobacterial Genomes Using Polymerase Chain Reaction with Single Universal Primer (UP-PCR)
Three tuberculous and twenty-one non-tuberculous mycobacterial (NTM) reference strains, along with seventy-two isolates classified by biochemical tests, were shown to produce specific sets of DNA fragments in a polymerase chain reaction with a single universal primer (UP-PCR). A rather wide limit of tolerance for variations in the PCR mixture preparation procedure and in thermocycling parameters was found. There...
Universal Lengths in Complete Microbial Genomes
Statistical analysis of the frequency of occurrence of short words in complete genomes reveals the existence of a set of universal lengths common to all extant complete microbial genomes. This phenomenon is consistent with a model for genome growth in which primitive genomes grew mainly by maximally stochastic duplications of short segments from an initial length of about 200 nucleotides. The relevanc...
Universal Lengths in Microbial Genomes and Implication for Early Genome Growth
We report the discovery of a set of universal lengths that characterize all complete microbial genomes. The Shannon information [Shannon 1948] of 108 complete microbial genomes relative to that of their respective randomized counterparts is computed, and the results are summarized in a two-parameter exponential relation: L_r(k) = (42 ± 21) × 2.64^k, 2 ≤ k ≤ 10, where L_r is a "root-sequence length" ...
Shannon information and self-similarity in whole genomes
The Shannon information (SI) in distributions of the occurrence frequency of short words in whole genomes is shown to exhibit universality. For a given word length, the SI in genomes of all lengths is the same as that in random sequences of a universal length L_r. For the shorter words, L_r is far shorter than the genome. For example, L_r ∼ 1000 bases for three-letter words. We further show that whole ...
Four basic symmetry types in the universal 7-cluster structure of 143 complete bacterial genomic sequences
Coding information is the main source of heterogeneity (non-randomness) in the sequences of bacterial genomes. This information can be naturally modeled by analysing cluster structures in the “in-phase” triplet distributions of relatively short genomic fragments (200-400bp). We found a universal 7-cluster structure in all 143 completely sequenced bacterial genomes available in Genbank in August...
Journal:
- Physical Biology
Volume 15, Issue 3
Publication date: 2018